From Norms to Neuropsychopathology: Exploring Neuroanatomical & Neuropsychiatric Variation in Alzheimer’s Disease Through Normative Modeling
This study extends analyses conducted by Verdi et al. (2023), who utilized normative modeling techniques to delineate neuroanatomical heterogeneity in Alzheimer’s disease. We employ a similar methodological framework to supplement these insights by integrating both structural MRI data and neuropsychiatric symptom profiles. This integration allowed us to explore more deeply the neuroanatomical and neuropsychiatric underpinnings of Alzheimer’s disease across different phenotypes within our ADNI subset. This study was conducted by the Applied Neuroimaging Statistics Ressearch Laboratory at McLean Hospital & Harvard Medical School.
Author: Adrian Medina
Date: October 2, 2024
Project Overview
Analytic Background: Our study data were a subset derived from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) ADNI3 wave data bank. This was due to the harmonization of scanner sequence protocols across data collection sites that began during this wave. We only included subjects that had both structural MRI and PET (amyloid & tau) data collected, followed by QA of these data. This analysis leveraged a normative model developed by the Predictive Clinical Neuroscience Group at the Donders Institute and Radboud UMC, which aims to predict and stratify brain disorders on the basis of neuroimaging data. Specifically, we used ‘HBR_lifespan_36K_79sites’, which makes use of the Hierarchical Bayesian Regression algorithm trained on 37,128 subjects from 79 different collection sites, across the human lifespan.
Ideally, both adaptation and testing sets would be balanced by age, sex, and site (covariates) following something like a 60/40 or 70/30 split of healthy controls. However, given our limited sample size, we decided to keep all of our healthy control and patient data isolated.
In our analysis: The adaptation set is used to calibrate for site (onlyhealthy controls) while the testing set is used exclusively for patient-phenotyped data.
Healthy control phenotypes include ‘A-C-’ (amyloid-tau negative, cognitive impairment negative) & ‘A+C-’ (amyloid-tau positive, cognitive impairment negative).
As a consequence of simultaneous conditions, a smaller sample size & larger site numbers (59 sites), our group elected to utilize the MRI manufacturer (3 total) of the subject’s imaging data to act as a pseudo ‘site’ variable thus giving more power to viably calibrate for potential ‘site’ influences.
‘1’ = ‘GE’
‘2’ = ‘Philips’
‘3’ = ‘Siemens’
Code Workflow
Data Frame Initialization: Install/load packages needed to run any/all chunks in this code file, and load/merge dataset(s) with imaging &/or non-imaging variables of interest. The finalized, merged file is what will be used to create adaptation-testing sets in the next step.
Note: Some manual edits may be needed to match your imaging variables to the column headers listed in the normative model CSV template (listed in the Normative Modeling GUI).
Covariate Distributions, Data Splitting, & Balance Verification: Visualize distributions of the covariates of interest prior to splitting your dataset. Produce the adaptation and testing datasets needed to compute deviation scores of subject neuroimaging (structural MRI) data. Visualize the proportions of covariates of interest following the adaptation-testing split to verify balance between the two sets.
Note: Upon completion of this step, you are then able to run your sets through the normative model deviation scores computation via either the Normative Modeling GUI or Braincharts GUI (if you want to recreate the Python code yourself).
Analytic Data Preparation: Preparing analytical data for both group comparisons of cortical thickness and broader statistical analyses. Loading and cleaning study data from its CSV file, which includes renaming columns for clarity, re-coding categorical variables such as biomarkers and sites, and setting appropriate factor levels. For analyses excluding group cortical thickness comparisons, the procedure extends to managing deviation scores, generating dummy variables, eliminating unnecessary columns, and ensuring the data’s integrity for modeling. Additionally, the Destrieux atlas data is loaded from the ggseg package, with preliminary visualizations executed to confirm the structure and detail of the data. This includes plotting atlas dimensions and displaying labeled regions to verify data accuracy. These meticulous preparations ensure the data is aptly formatted and enhanced, ready for subsequent in-depth statistical analysis and visualization tasks.
Demographic Descriptives and Statistical Analysis of Cortical Thickness: Perform statistical analyses to understand the demographic distribution and cortical thickness variations across different biomarker groups. Calculate the counts for biomarkers, sex, and site categories and provides descriptive statistics for mean cortical thickness and age by biomarker group. Visualizations such as violin plots and bar charts are generated to display the distribution of age and sex across biomarker groups. Modeling cortical thickness variations, fitting linear models to predict mean cortical thickness based solely on biomarker groups and then incorporating additional covariates like age and sex. Detail the model outcomes with confidence intervals, performs Tukey HSD tests to scrutinize mean differences across groups, and visualizes these differences and their statistical significance. Visualize adjusted models’ predictions to provide a comprehensive understanding of cortical thickness influences.
Regional Analyses: Across-Groups & Pairwise Comparisons: Apply the suffix ‘_rois’ to all ROI FreeSurfer measures in your data frame, perform ANOVA analyses to extract p-values for each ROI across biomarker groups, and run FDR corrections. Further, derive F-stat values, merge and organize results for visualization, and display a heatmap of the significant F-stat values, annotated with significance levels. Remove the ‘_rois’ suffix from your ROI list and ensure that ROI names in your dataset match the Destrieux atlas nomenclature by extensively modifying and standardizing ROI names for both left and right hemispheres, then verifying and assigning corrected names in your data frame. Visualize FDR-corrected p-values using ggseg for both left and right hemispheres, employing color gradients to represent significance levels, and save the combined plots as a PDF. Prepare data subsets for pairwise t-tests between different biomarker groups and perform these tests to extract statistical measures such as t-values, p-values, and FDR corrections, then visualize significant results using a lollipop plot to highlight the differences in brain regions across comparisons. Transform ROI names in your dataset to match the Destrieux atlas nomenclature by applying a function that renames based on hemisphere specifications and then ensures consistency with atlas data, using pattern replacements and validation against the atlas’s region list. Visualize and compare the significant FDR-corrected p-values across different biomarker groups (Control vs. Alzheimer’s, Control vs. MCI, and MCI vs. Alzheimer’s) for both hemispheres using ggseg maps, adjust the color gradients to highlight significance levels, and then compile all plots into a PDF for detailed analysis.
Session Information: To enhance reproducibility, details about the working environment used for these analyses can be found below.
Data Frame Initialization
Code
# Set CRAN repository for consistent package installation, ggseg for neuroimaging visualizationsoptions(repos =c(ggseg ='https://ggseg.r-universe.dev',CRAN ='https://cloud.r-project.org'))# Install and load the pacman package for efficient package managementif (!require(pacman)) install.packages("pacman")library(pacman)# Use p_load function to install (if necessary) and load packagesp_load(dplyr, knitr, kableExtra, psych, tidyr, readr, stringr, matrixStats, ggseg, ggseg3d, ggsegExtra, ggsegDesterieux, cowplot, data.table, e1071, ggplot2, plot.matrix, proxy, RPMG, broom, gridExtra, patchwork, caret, tidyverse, fastDummies, sjPlot, ggbeeswarm, lavaan, caTools, bitops, effects, ggeffects, reshape2, ggpubr)# Specify the 'base_path' where you can find your data files, ASSUMING they're in the same directory, & set it as WDbase_path <-"~/GitHub/ADNI_NormativeModeling/data_files"setwd(base_path)# Read in foundational datasetsNIDP_DX_Dem_NP <-read_csv("NIDP_DX_Dem_NP_MRI.csv") # Data set with clinical/non-imaging variables and MRI manufacturer information per subject assuming separate from FreeSurfer datasetADNI_freesurfer <-read_csv("ADNI_lh_rh_thickness_subcort_volumes.csv") # Data set with FreeSurfer brain thickness values# Merging datasets on 'Subject_ID' with only ID values present in NIDP_DX_Dem_NPADNI_274_Merged <-merge(NIDP_DX_Dem_NP, ADNI_freesurfer, by ="Subject_ID", all.x =TRUE)# Save the newly merged dataset as a CSV filewrite_csv(ADNI_274_Merged, "ADNI_274_Merged.csv")### Manual edits were done to this spreadsheet to only leave FreeSurfer values and balancing variables for data split### FreeSurfer variables were edited to match PCN template, see GUI for details### Covariates: 'age', 'sex', 'site'# Rename finalized, merged file to 'ADNI_274_Merged_Final.csv'ADNI_274_Merged_Final <-read_csv("ADNI_274_Merged_Final.csv") # This should be the spreadsheet that will be split into adaptation and testing sets for normative modeling# Read in computed deviation scores (the file you get back after uploading adaptation and testing sets)ADNI_deviation_scores <-read_csv("ADNI_deviation_scores_BLR.csv") # This dataset contains raw subject FreeSurfer volumetric measures and their respective deviation scores for your TESTING set used in the Normative Modeling# Read in PDC deviation data setsPDC_HBR <-read_csv("PDC_HBR.csv")PDC_BLR <-read_csv("PDC_BLR.csv")
Covariate Distributions, Data Splitting, & Balance Verification
Code
# Create a bar plot for MRI manufacturers with counts displayedggplot(NIDP_DX_Dem_NP, aes(x = MRI_MANU)) +geom_bar(stat ="count", fill ="skyblue", color ="black") +geom_text(stat ="count", aes(label = ..count..), vjust =-0.5) +labs(title ="Count of MRI Manufacturers", x ="MRI Manufacturer", y ="Count") +theme_minimal()
Code
# Create separate bar plots for each manufacturerggplot(NIDP_DX_Dem_NP, aes(x = MRI_MAKE, fill = MRI_MANU)) +geom_bar(position ="dodge") +geom_text(aes(label = ..count..), stat ="count", position =position_dodge(width =0.9), vjust =-0.5) +facet_wrap(~MRI_MANU, scales ="free_x") +labs(title ="Counts of Each MRI Make Grouped by Manufacturer", x ="MRI Make", y ="Count") +theme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1)) # Rotate x-axis labels for better visibility
Code
# Counting subjects by BioMarkers within each sitebio_markers_site_counts <- ADNI_274_Merged_Final %>%group_by(site, BioMarkers) %>% dplyr::summarise(Count =n(), .groups ='drop')# Plot with Site on x-axisplot_site_x <-ggplot(bio_markers_site_counts, aes(x = site, y = Count, fill = BioMarkers)) +geom_bar(stat ="identity", position =position_dodge()) +geom_text(aes(label = Count), vjust =-0.5, position =position_dodge(width =0.9)) +labs(title ="Subject Counts by BioMarkers Across Sites",x ="Site",y ="Number of Subjects",fill ="BioMarkers") +theme_minimal() +theme(axis.text.x =element_text(angle =0, hjust =1))# Plot with BioMarkers on x-axisplot_biomarkers_x <-ggplot(bio_markers_site_counts, aes(x = BioMarkers, y = Count, fill =factor(site))) +geom_bar(stat ="identity", position =position_dodge()) +# Use position_dodge() for proper groupinggeom_text(aes(label = Count), vjust =-0.5, position =position_dodge(width =0.9)) +# Adjust text positioninglabs(title ="Subject Counts Across BioMarkers by Site",x ="BioMarkers",y ="Number of Subjects",fill ="Site") +scale_fill_brewer(palette ="Set1") +# Using a qualitative palette for distinct sitestheme_minimal() +theme(axis.text.x =element_text(angle =0, hjust =1)) # Improve readability of x-axis labels# Optionally, print both plots to view them in the R environmentprint(plot_site_x)
Code
print(plot_biomarkers_x)
Code
# Separate the data based on 'BioMarkers'adaptation_data <- ADNI_274_Merged_Final %>%filter(BioMarkers %in%c("A-C-", "A+C-"))testing_data <- ADNI_274_Merged_Final %>%filter(BioMarkers %in%c("A-C+", "A+C+"))# Save the adaptation and testing datasets as CSV files, un-comment when v# write_csv(adaptation_data, "ADNI_175_Adaptation.csv")# write_csv(testing_data, "ADNI_99_Testing.csv")# After verifying that the grouping variable and covariates are balanced as desired (i.e., see the next chunk), you can now run them through the normative model computation via the GUI or recreate their code yourself!
Code
# Plotting the distribution of 'BioMarkers' in both datasetsbiomarker_distribution <-bind_rows(mutate(adaptation_data, dataset ="Adaptation"),mutate(testing_data, dataset ="Testing")) %>%group_by(dataset, BioMarkers) %>% dplyr::summarise(Count =n(), .groups ='drop') %>%group_by(dataset) %>%mutate(Percentage = (Count /sum(Count)) *100)ggplot(biomarker_distribution, aes(x = BioMarkers, y = Percentage, fill = dataset)) +geom_bar(stat ="identity", position =position_dodge()) +labs(title ="Distribution of Biomarker", x ="Biomarker", y ="Percentage (%)") +theme_minimal()
Code
# Plotting the distribution of 'site' in both datasetssite_distribution <-bind_rows(mutate(adaptation_data, dataset ="Adaptation"),mutate(testing_data, dataset ="Testing")) %>%group_by(dataset, site) %>% dplyr::summarise(Count =n(), .groups ='drop') %>%group_by(dataset) %>%mutate(Percentage = (Count /sum(Count)) *100)ggplot(site_distribution, aes(x = site, y = Percentage, fill = dataset)) +geom_bar(stat ="identity", position =position_dodge()) +labs(title ="Distribution of Site", x ="Site", y ="Percentage (%)") +theme_minimal()
Code
# Plotting the distribution of 'sex' in both datasetssex_distribution <-bind_rows(mutate(adaptation_data, dataset ="Adaptation"),mutate(testing_data, dataset ="Testing")) %>%group_by(dataset, sex) %>% dplyr::summarise(Count =n(), .groups ='drop') %>%group_by(dataset) %>%mutate(Percentage = (Count /sum(Count)) *100)ggplot(sex_distribution, aes(x = sex, y = Percentage, fill = dataset)) +geom_bar(stat ="identity", position =position_dodge()) +labs(title ="Distribution of Sex", x ="Sex", y ="Percentage (%)") +theme_minimal()
Code
# Calculating and comparing the average ageage_summary <-bind_rows(mutate(adaptation_data, dataset ="Adaptation"),mutate(testing_data, dataset ="Testing")) %>%group_by(dataset) %>% dplyr::summarise(Average_Age =mean(age, na.rm =TRUE), .groups ='drop')print(age_summary)
# Read in Destrieux atlas datadata("desterieux", package ="ggseg") # Note: the ggseg package adds an extra 'e' in the spelling of the name Destrieux, so it's NOT a typo# Assign atlas data to data frames in local environmentdesterieux_dims <- desterieux$datadesterieux <- desterieux# Plot simple atlas features to test data frame with dimension dataplot(desterieux_dims) +theme(legend.position ="bottom",legend.text =element_text(size =7)) +guides(fill =guide_legend(ncol =3))
NULL
Code
# Plot atlas ROIs with labels to test data frame with complete vectorsggplot() +geom_brain(atlas = desterieux)
Code
# Rename data frame for efficient code modeling, using dataset with volumetric FreeSurfer measures for ALL subjectsdf_merged <- ADNI_274_Merged_Final # to be used ONLY for group cortical thickness comparisons# Rename biomarker variabledf_merged <- dplyr::rename(df_merged, biomarker ="BioMarkers")# Specifying the order of the first few columns for easier viewingdesired_order <-c("Subject_ID", "age", "sex", "site", "biomarker")# Append the remaining column names that are not specified in desired_orderremaining_cols <-setdiff(names(df_merged), desired_order)# Combine the specified order with the remaining columnsnew_order <-c(desired_order, remaining_cols)# Reorder the columns in df according to new_orderdf_merged <- df_merged[, new_order]# Recode biomarker variabledf_merged$biomarker <-recode(df_merged$biomarker,"A-C-"="HC","A+C-"="HC","A-C+"="MCI","A+C+"="AD")df_merged$biomarker <-as.factor(df_merged$biomarker)df_merged$biomarker <-factor(df_merged$biomarker, levels =c("HC", "MCI", "AD")) # explicitly set the reference level# Recode site variabledf_merged$site <-as.factor(df_merged$site)df_merged$site <-gsub("1", "GE", df_merged$site)df_merged$site <-gsub("2", "Philips", df_merged$site)df_merged$site <-gsub("3", "Siemens", df_merged$site)# Recode sex variable, 0 = females 1 = malesdf_merged$sex <-factor(df_merged$sex)df_merged$sex <-gsub("0", "Female", df_merged$sex)df_merged$sex <-gsub("1", "Male", df_merged$sex)
Demographic Descriptives and Statistical Analysis of Cortical Thickness
Code
# Count of each variable of interesttable(df_merged$biomarker)
HC MCI AD
175 40 59
Code
table(df_merged$sex)
Female Male
153 121
Code
table(df_merged$site)
GE Philips Siemens
49 48 177
Code
# Cross-Tabulation of average thickness and biomarker groupdescribeBy(df_merged$Mean_Thickness, df_merged$biomarker)
Descriptive statistics by group
group: HC
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 175 2.43 0.08 2.43 2.43 0.07 2.17 2.65 0.48 -0.15 0.12 0.01
------------------------------------------------------------
group: MCI
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 40 2.4 0.08 2.41 2.4 0.07 2.24 2.59 0.35 -0.1 -0.28 0.01
------------------------------------------------------------
group: AD
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 59 2.35 0.09 2.34 2.35 0.1 2.16 2.55 0.39 -0.08 -0.83 0.01
Code
d<-(describeBy(df_merged$Mean_Thickness, df_merged$biomarker))# Cross-Tabulation of age and biomarker groupdescribeBy(df_merged$age, df_merged$biomarker)
Descriptive statistics by group
group: HC
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 175 70.1 5.94 69 69.81 5.93 55 90 35 0.53 0.66 0.45
------------------------------------------------------------
group: MCI
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 40 71.97 8.81 72 72.16 8.9 56 88 32 -0.22 -0.87 1.39
------------------------------------------------------------
group: AD
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 59 72.25 7.73 73 72.43 7.41 55 89 34 -0.19 -0.33 1.01
Code
mean(df_merged$age)
[1] 70.83942
Code
sd(df_merged$age)
[1] 6.871648
Code
# Generate the violin plot of age stratified by phenotype categoriesggplot(df_merged, aes(x = biomarker, y = age, fill = biomarker)) +geom_violin() +labs(title ="Distribution of Age Stratified by Phenotype Groups",x ="Phenotype Group", y ="Age") +theme_minimal() +theme(axis.text.x =element_text(angle =0, hjust =1)) # Improve readability of x-axis labels
Code
# Cross-Tabulation of sex & biomarker, visualizationggplot(df_merged, aes(x = biomarker, fill = sex)) +geom_bar(position ="dodge") +labs(title ="Distribution of Sex Across Phenotype Groups",x ="Phenotype Group",y ="Count",fill ="Sex") +theme_minimal()
Code
# Create the summary Mean_Thickness data framefiltered_MT_summary <- df_merged %>%group_by(biomarker) %>% dplyr::summarise(score_mean =mean(Mean_Thickness, na.rm =TRUE),n =n(), # Sample size for each groupsem =sd(Mean_Thickness, na.rm =TRUE) /sqrt(n()),.groups ='drop' )# Create the rain cloud plot with separate componentsggplot(df_merged, aes(x = biomarker, y = Mean_Thickness, fill = biomarker, color = biomarker)) + PupillometryR::geom_flat_violin(adjust =1.5, trim =FALSE, alpha =0.5, position =position_nudge(x =0.2, y =0), colour =NA) +geom_point(aes(x =as.numeric(biomarker)-0.25, y = Mean_Thickness, colour = biomarker), position =position_jitter(width = .05), size = .5, shape =20) +geom_boxplot(outlier.shape =NA, alpha =0.5, width =0.25, position =position_dodge(width =0.3), colour ="black") +geom_point(data = filtered_MT_summary, aes(x =factor(biomarker), y = score_mean), shape =18, position =position_nudge(x =0.2)) +geom_errorbar(data = filtered_MT_summary, aes(x =factor(biomarker), y = score_mean, ymin = score_mean - sem, ymax = score_mean + sem), width =0.05, position =position_nudge(x =0.2)) +labs(title ="Distribution of Mean Cortical Thickness Stratified by Phenotype Groups", y ="Score", x ="Biomarker", fill ="Phenotype Group", # Legend title for fillcolor ="Phenotype Group"# Legend title for color ) +theme_minimal() +theme(legend.position ="right",legend.title =element_text(face ="bold") # Make legend title bold )
Code
# Fit a linear model of Mean_Thickness as a function of biomarker and store it in s_2s_2 <-lm(Mean_Thickness ~ biomarker, data = df_merged)# Create a table for the linear model with confidence intervalstab_model(s_2, title ="Linear Model: Average Cortical Thickness as a Function of Biomarker")
Linear Model: Average Cortical Thickness as a Function of Biomarker
Mean Thickness
Predictors
Estimates
CI
p
(Intercept)
2.43
2.42 – 2.44
<0.001
biomarker [MCI]
-0.03
-0.06 – 0.00
0.079
biomarker [AD]
-0.08
-0.10 – -0.05
<0.001
Observations
274
R2 / R2 adjusted
0.126 / 0.119
Code
# Perform Tukey's Honest Significant Difference test and store resultstukey_results_s2 <-TukeyHSD(aov(Mean_Thickness ~ biomarker, data = df_merged))print(tukey_results_s2)
Tukey multiple comparisons of means 95% family-wise confidence level
Fit: aov(formula = Mean_Thickness ~ biomarker, data = df_merged)
tukey_data_s2 <-as.data.frame(tukey_results_s2$biomarker)tukey_data_s2$Comparison <-rownames(tukey_data_s2)tukey_data_s2$significant <-ifelse(tukey_data_s2$`p adj`<0.05, "Significant", "Not Significant")# Creating the plot of the Tukey HSD test with significance indicationggplot(tukey_data_s2, aes(y = Comparison, xmin = lwr, xmax = upr, x = diff)) +geom_vline(xintercept =0, linetype ="dashed", color ="black") +geom_errorbarh(aes(height =0.2, color = significant)) +geom_point(aes(x = diff, color = significant), size =2) +scale_color_manual(values =c("Significant"="red", "Not Significant"="blue")) +labs(title ="Tukey HSD Test Results for Mean Cortical Thickness by Biomarker",x ="Differences in Mean Levels of Biomarker",y ="Comparison",color ="P-value Significance") +theme_minimal()
Code
# Fit a linear model of Mean_Thickness as a function of biomarker with added covariatess_1<-lm(Mean_Thickness ~ biomarker + age + sex, data = df_merged)# Create a table for the linear model including covariates with confidence intervalstab_model(s_1, title ="Enhanced Linear Model: Average Cortical Thickness as a Function of Biomarker, Adjusted for Age and Sex")
Enhanced Linear Model: Average Cortical Thickness as a Function of Biomarker, Adjusted for Age and Sex
Mean Thickness
Predictors
Estimates
CI
p
(Intercept)
2.58
2.48 – 2.68
<0.001
biomarker [MCI]
-0.02
-0.05 – 0.01
0.221
biomarker [AD]
-0.07
-0.10 – -0.05
<0.001
age
-0.00
-0.00 – -0.00
0.007
sex [Male]
-0.02
-0.04 – 0.00
0.119
Observations
274
R2 / R2 adjusted
0.159 / 0.147
Code
# Plot multiple regression modelggpredict(s_1, c(terms ="age", "biomarker", "sex")) |>plot() +labs(title ="Average Cortical Thickness as a Function of Biomarker, Adjusted for Age and Sex",x ="Age",y ="Mean Cortical Thickness (mm)",caption ="Note: 'biomarker' factors, 'HC' = Healthy Control; 'MCI' = Mild Cognitive Impairment; & 'AD' = Alzheimer's Disease.")
# Apply suffix to all ROI FreeSurfer measuresdf.rois <- df_merged %>%rename_at(vars((6:153)), ~paste0(., '_rois'))# Run ANOVA analyses for each ROI across biomarker group and extract just the p-valuesdf.stats <-as.data.frame(sapply(X = df.rois[,grep("_rois", names(df.rois),value = T)], FUN =function(x) summary(aov(x ~ df.rois$biomarker))[[1]][["Pr(>F)"]][1]))# Rename columns in ANOVA output data framenames(df.stats) <-"p_value"setDT(df.stats, keep.rownames ="ROI")# Verify that changes were made via the first 20 ROIshead(df.stats, 20)
### Repeat steps to now obtain the F-stat values# Run ANOVA analyses for each ROI across biomarker group and extract just the F-stat valuesdf.f_stats <-as.data.frame(sapply(X = df.rois[,grep("_rois", names(df.rois),value = T)], FUN =function(x) summary(aov(x ~ df.rois$biomarker))[[1]][["F value"]][1]))# Rename columns in ANOVA output data framenames(df.f_stats) <-"f_stat"setDT(df.f_stats, keep.rownames ="ROI")# Verify that changes were made via the first 20 ROIshead(df.f_stats, 20)
# Assign significance levelsdf.stats$Significance <-ifelse(df.stats$FDR.pvalue <0.001, '***',ifelse(df.stats$FDR.pvalue <0.01, '**',ifelse(df.stats$FDR.pvalue <0.05, '*', '')))# Merge data frames for visualizationdf.stats_merged <-merge(df.stats, df.f_stats, by ="ROI")# Melt the data for plottingdf.melted <-melt(df.stats_merged, id.vars ="ROI", variable.name ="Statistic", value.name ="Value")# Ensure that 'Value' is numericdf.melted$Value <-as.numeric(as.character(df.melted$Value))# Ensure df.stats and df.melted are merged to filter and sortdf.melted <- df.melted %>%filter(Statistic =="f_stat") %>%left_join(df.stats %>%select(ROI, Significance, FDR.pvalue), by ="ROI") %>%filter(FDR.pvalue <0.05) %>%arrange(match(Significance, c("", "*", "**", "***"))) # Order by significance levels# Remove suffix from ROI listdf.melted$ROI <-gsub("_rois", "", df.melted$ROI)# Define group positions and labelsgroup_positions <-data.frame(start =c(1, 15, 30), # example start positionsend =c(14, 29, 51), # example end positionslabel =c("*", "**", "***") # example labels for significance)# Create the heatmapheatmap_plot <-ggplot(df.melted, aes(x = ROI, y = Statistic, fill = Value)) +geom_tile() +scale_fill_gradient(low ="blue", high ="red") +labs(title ="Heatmap of ANOVA Statistics Across ROIs", x ="Region of Interest", y ="F-Statistic") +theme_minimal() +theme(axis.text.x =element_text(angle =90, hjust =1))# Add custom brackets using annotationsfor (i in1:nrow(group_positions)) { heatmap_plot <- heatmap_plot +annotate("segment", x = group_positions$start[i], xend = group_positions$end[i], y =Inf, yend =Inf, yjust =1.5, color ="black", size =0.5) +annotate("text", x = (group_positions$start[i] + group_positions$end[i]) /2, y =Inf, label = group_positions$label[i], vjust =2, hjust =0.5)}print(heatmap_plot)
Code
# Remove suffix from ROI listdf.stats$ROI <-as.data.frame(gsub("_rois", "", df.stats$ROI))### Rename ROIs to assign nomenclature consistent with the 'desterieux' atlas package# Left hemisphere ROIsleft.df.stats <- df.stats %>%filter(str_detect(ROI, "L_"))x <-gsub("L_", "", left.df.stats$ROI)y <-gsub("G&S_", "G and S ", x)z <-gsub("G_", "G ", y)z1 <-gsub("S_", "S ", z)z2 <-gsub("_", " ", z1)z3 <-gsub(" bin", "", z2)z4 <-gsub("\\.", " ", z3)z5 <-gsub("cingul ", "cingul-", z4)z6 <-gsub("Mid ", "Mid-", z5)z7 <-gsub("Post ", "Post-", z6)z8 <-gsub("inf ", "inf-", z7)z9 <-gsub("med ", "med-", z8)z10 <-gsub("sup ", "sup-", z9)z11 <-gsub("Fis ant ", "Fis-ant-", z10)z12 <-gsub("precentral ", "precentral-", z11)z13 <-gsub("Fis pos ", "Fis-pos-", z12)z14 <-gsub("lg&S", "lg and S", z13)z15 <-gsub("oc temp", "oc-temp", z14)z16 <-gsub("sup and transversal", "sup-transversal", z15)z17 <-gsub("orbital H Shaped", "orbital-H Shaped", z16)z18 <-gsub("oc sup&transversal", "oc sup and transversal", z17)z19 <-gsub("prim Jensen", "prim-Jensen", z18)z20 <-gsub("S oc-temp med&Lingual", "S oc-temp med and Lingual", z19)z21 <-gsub("lat fusifor", "lat-fusifor", z20)z22 <-gsub("middle&Lunatus", "middle and Lunatus", z21)z23 <-gsub("intrapariet&P trans", "intrapariet and P trans", z22)renamed_ROIs <-gsub("Lat Fis post", "Lat Fis-post", z23)desterieux_ROIs <-as.data.frame(desterieux_dims %>%filter(hemi =="left"))$regioncompare_lists <-cbind(sort(renamed_ROIs), sort(unique(desterieux_ROIs)))list_matches <- compare_lists[,1] %in% compare_lists[,2]compare_lists[!list_matches,]
[,1] [,2]
Code
## if no mismatches, than add to data.frame as regionleft.df.stats$region <- renamed_ROIs# Right hemisphere ROIsright.df.stats <- df.stats %>%filter(str_detect(ROI, "R_"))x <-gsub("R_", "", right.df.stats$ROI)y <-gsub("G&S_", "G and S ", x)z <-gsub("G_", "G ", y)z1 <-gsub("S_", "S ", z)z2 <-gsub("_", " ", z1)z3 <-gsub(" bin", "", z2)z4 <-gsub("\\.", " ", z3)z5 <-gsub("cingul ", "cingul-", z4)z6 <-gsub("Mid ", "Mid-", z5)z7 <-gsub("Post ", "Post-", z6)z8 <-gsub("inf ", "inf-", z7)z9 <-gsub("med ", "med-", z8)z10 <-gsub("sup ", "sup-", z9)z11 <-gsub("Fis ant ", "Fis-ant-", z10)z12 <-gsub("precentral ", "precentral-", z11)z13 <-gsub("Fis pos ", "Fis-pos-", z12)z14 <-gsub("lg&S", "lg and S", z13)z15 <-gsub("oc temp", "oc-temp", z14)z16 <-gsub("sup and transversal", "sup-transversal", z15)z17 <-gsub("orbital H Shaped", "orbital-H Shaped", z16)z18 <-gsub("oc sup&transversal", "oc sup and transversal", z17)z19 <-gsub("prim Jensen", "prim-Jensen", z18)z20 <-gsub("S oc-temp med&Lingual", "S oc-temp med and Lingual", z19)z21 <-gsub("lat fusifor", "lat-fusifor", z20)z22 <-gsub("middle&Lunatus", "middle and Lunatus", z21)z23 <-gsub("intrapariet&P trans", "intrapariet and P trans", z22)renamed_ROIs <-gsub("Lat Fis post", "Lat Fis-post", z23)desterieux_ROIs <-as.data.frame(desterieux_dims %>%filter(hemi =="right"))$regioncompare_lists <-cbind(sort(renamed_ROIs), sort(unique(desterieux_ROIs)))list_matches <- compare_lists[,1] %in% compare_lists[,2]compare_lists[!list_matches,]
[,1] [,2]
Code
## if no mismatches, than add to data.frame as regionright.df.stats$region <- renamed_ROIs
# Convert the data to a suitable format for ggplotcorrelation_df$Region <-factor(correlation_df$Region, levels = correlation_df$Region)# Create a scatter plot for each region showing the correlation values# Assuming 'merged_data' contains corresponding columns for HBR and BLR scoresp <-ggplot(correlation_df, aes(x = Region, y = Correlation)) +geom_point(stat ="identity") +geom_smooth(method ="lm", color ="blue") +ylim(0.8, 1) +labs(title ="Correlation of Deviation Scores Across Brain Regions",x ="Brain Region", y ="Correlation Coefficient") +theme_minimal() +theme(axis.text.x =element_text(angle =90, hjust =1, vjust =0.5, size =5.5))# Print the plotprint(p)
`geom_smooth()` using formula = 'y ~ x'
Code
# Adding a new column for significance level based on p-valuescorrelation_df$Significance <-ifelse(correlation_df$P_Value <0.001, '***',ifelse(correlation_df$P_Value <0.01, '**',ifelse(correlation_df$P_Value <0.05, '*', 'ns')))# Create the lollipop plotggplot(correlation_df, aes(x =reorder(Region, Correlation), y = Correlation, color = Significance)) +geom_segment(aes(xend = Region, yend =0.8), size =0.25) +# Create lines from 0 to the correlation valuegeom_point(size =1) +# Add points for each correlation valuescale_color_manual(values =c('***'='red', '**'='orange', '*'='blue', 'ns'='grey')) +# Assign colors based on significancelabs(title ="Correlation and Significance of Brain Regions: Hierarchical vs. Linear Bayesian Regression",x ="Brain Region",y ="Correlation Coefficient",color ="Significance Level" ) +theme_minimal() +theme(legend.position ="top", # Move the legend to the top of the plotaxis.text.x =element_text(angle =90, hjust =1, vjust =0.5, size =5.5), # Improve readability of x-axis labelslegend.title =element_text(face ="bold") # Bold the legend title for emphasis )
Z-Score Statistics Pipeline
Code
# Use dummy_cols to create dummy variables for the 'BioMarkers' columnADNI_deviation_scores <-dummy_cols(ADNI_deviation_scores, select_columns ="BioMarkers", remove_first_dummy =TRUE, remove_selected_columns =TRUE)# Rename data frame for efficient code modeling, this is the data frame that will be used for all other analyses EXCEPT group cortical thickness comparisonsdf <- ADNI_deviation_scores# Omit the 'index' column from the data frame, unnecessary variabledf <- df[, -which(names(df) =="index")]# Rename biomarker variabledf <- dplyr::rename(df, biomarker =`BioMarkers_A+C+`)# Specifying the order of the first few columns for easier viewingdesired_order <-c("Subject_ID", "age", "sex", "site", "biomarker")# Append the remaining column names that are not specified in desired_orderremaining_cols <-setdiff(names(df), desired_order)# Combine the specified order with the remaining columnsnew_order <-c(desired_order, remaining_cols)# Reorder the columns in df according to new_orderdf <- df[, new_order]# List of columns to exclude from the ROI listdesired_order <-c("Subject_ID", "age", "sex", "site", "biomarker")# Extract column names, remove those in desired_order, and remove z-score suffixesROI <-names(df)[!names(df) %in% desired_order &!grepl("_Z_predict", names(df))]# Further clean the ROI names to ensure no duplicates from z-score namesROI <-unique(gsub("_Z_predict", "", ROI))# Recode biomarker variabledf$biomarker <-as.factor(df$biomarker)df$biomarker <-gsub("0", "MCI", df$biomarker)df$biomarker <-gsub("1", "AD", df$biomarker)df$biomarker <-factor(df$biomarker, levels =c("MCI", "AD")) # explicitly set the reference level# Recode site variabledf$site <-as.factor(df$site)df$site <-gsub("1", "GE", df$site)df$site <-gsub("2", "Philips", df$site)df$site <-gsub("3", "Siemens", df$site)# Recode sex variable, 0= females 1= malesdf$sex <-factor(df$sex)df$sex <-gsub("0", "Female", df$sex)df$sex <-gsub("1", "Male", df$sex)
Code
# Establish outlier thresholdoutlier_threshold <--1.96# bottom 2.5%, as used in the Verdi et al., 2023 paper# Apply threshold to z-score data to create dummy columnsdf3 <-as.data.frame(ifelse(df[,6:153] < outlier_threshold,1,0))# Rename all binarized columns to have the suffix "_bin"df3 <- df3 %>%rename_all(paste0, "_bin")# Sum the dummy column values to create a total outlier scoredf$total_outlier_score <-rowSums(df3)# Merge the two data framesdf <-cbind(df, df3)# Remove df3 from your R environment as it's no longer neededrm(df3)# Create the box plot with colored outlines and filled pointsggplot(df, aes(x = biomarker, y = total_outlier_score, color = biomarker, fill = biomarker)) +geom_boxplot(outlier.shape =NA, fill =NA, size =1) +# Draw box plots with no fill, only outlinesgeom_jitter(width =0.2, size =2, shape =21) +# Add jittered points with the same colorlabs(x ="Biomarker", y ="Total Outlier Score") +theme_minimal() +theme(legend.position ="right")
Code
# Calculate descriptives of total outlier score grouped by biomarker groupdescribeBy(df$total_outlier_score, df$biomarker, IQR = T)
Descriptive statistics by group
group: MCI
vars n mean sd median trimmed mad min max range skew kurtosis se IQR
X1 1 40 5.3 6.79 2 4 2.97 0 28 28 1.59 1.78 1.07 7
------------------------------------------------------------
group: AD
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 59 11.83 15.46 6 8.9 7.41 0 66 66 1.85 2.86 2.01
IQR
X1 12.5